(Thread-)Parallelize bounds check routine for subcell IDP limiting #1736

Merged

Contributor

@bennibolm bennibolm commented Nov 14, 2023

For larger simulations, the bounds check routine for IDP limiting requires more and more time since it was not parallelized. After this PR, the loop over all elements is thread-parallelized.
For that, the memory for the deviations had to be changed:
I split it into a global component, idp_bounds_delta_global, and a local component, idp_bounds_delta_local. Both are dictionaries keyed by the respective variable bounds.

  • idp_bounds_delta_local (containing the maximum deviations in the current timestep interval) now stores a vector per bound and is therefore safe to write to in parallel. To avoid false sharing, the vector is extended and accessed with a stride.

  • idp_bounds_delta_global (containing the global maximum deviations) doesn't need to be thread-safe, since it only uses the already parallel-computed result of the local maximum deviation.
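
The padded, strided layout described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the function name `max_lower_deviation`, the constant `STRIDE`, and the `n_slots` keyword are hypothetical, a 64-byte cache line is assumed, and the slots are indexed by chunk number rather than by `Threads.threadid()` so that the sketch does not depend on the scheduling of `Threads.@threads`.

```julia
using Base.Threads: @threads, nthreads

# Assumption: 8 Float64 values = 64 bytes, a common cache-line size.
const STRIDE = 8

# Hypothetical helper: largest amount by which any entry of `values`
# falls below `lower_bound` (0.0 if the bound is never violated).
function max_lower_deviation(values::Vector{Float64}, lower_bound::Float64;
                             n_slots::Int = nthreads())
    # One padded slot per chunk; only entries 1, 1 + STRIDE, ... are used,
    # so two tasks never write to neighboring memory locations.
    deviation_local = zeros(Float64, STRIDE * n_slots)
    chunk_length = cld(length(values), n_slots)

    @threads for slot in 1:n_slots
        idx = STRIDE * (slot - 1) + 1
        first_element = (slot - 1) * chunk_length + 1
        last_element = min(slot * chunk_length, length(values))
        for element in first_element:last_element
            deviation_local[idx] = max(deviation_local[idx],
                                       lower_bound - values[element])
        end
    end

    # Serial reduction over the used slots only.
    return maximum(deviation_local[1:STRIDE:end])
end
```

For example, `max_lower_deviation([1.0, 0.5, -0.25, 2.0], 0.0)` returns `0.25`, since only the third entry violates the lower bound.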

Additionally, this PR parallelizes resetting the subcell limiting coefficients alpha.

Contributor

github-actions bot commented Nov 14, 2023

Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

  • The PR has a single goal that is clear from the PR title and/or description.
  • All code changes represent a single set of modifications that logically belong together.
  • No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

  • The code can be understood easily.
  • Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
  • There are no redundancies that can be removed by simple modularization/refactoring.
  • There are no leftover debug statements or commented code sections.
  • The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

  • New functions and types are documented with a docstring or top-level comment.
  • Relevant publications are referenced in docstrings (see example for formatting).
  • Inline comments are used to document longer or unusual code sections.
  • Comments describe intent ("why?") and not just functionality ("what?").
  • If the PR introduces a significant change or new feature, it is documented in NEWS.md.

Testing

  • The PR passes all tests.
  • New or modified lines of code are covered by tests.
  • New or modified tests run in less than 10 seconds.

Performance

  • There are no type instabilities or memory allocations in performance-critical parts.
  • If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

  • The correctness of the code was verified using appropriate tests.
  • If new equations/methods are added, a convergence test has been run and the results
    are posted in the PR.

Created with ❤️ by the Trixi.jl community.

codecov bot commented Nov 14, 2023

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (fcf2652) 96.34% compared to head (9c66099) 96.34%.

Files Patch % Lines
src/callbacks_stage/subcell_bounds_check_2d.jl 96.88% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1736      +/-   ##
==========================================
- Coverage   96.34%   96.34%   -0.00%     
==========================================
  Files         451      451              
  Lines       35979    35996      +17     
==========================================
+ Hits        34662    34677      +15     
- Misses       1317     1319       +2     
Flag Coverage Δ
unittests 96.34% <97.62%> (-<0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.


@DanielDoehring added the labels performance (We are greedy) and parallelization (Related to MPI, threading, tasks etc.) Nov 15, 2023
Contributor Author

bennibolm commented Nov 20, 2023

EDIT: Outdated structure

The current thread implementation is not the cleanest.

    # Threaded memory for bounds checking routine with `BoundsCheckCallback`.
    # The first entry of each vector contains the maximum deviation since the last export.
    # In the second entry, the total maximum deviation is saved.
    idp_bounds_delta_threaded = Dict{Symbol, Vector{Vector{real(basis)}}}()
    for key in bound_keys
        idp_bounds_delta_threaded[key] = [zeros(real(basis), 2)
                                          for _ in 1:Threads.nthreads()]
    end

First, the dictionary has to be the outermost layer, since every call of Symbol("...") causes a small number of allocations.
I'm aware that the memory for the global maximum deviation of threads 2 to nthreads() isn't used right now. Maybe there is another way to implement this thread-parallel bounds check.
(Idea: Use a dictionary with vectors of length nthreads() + 1. The first nthreads() entries are used to calculate the current deviations in parallel. The last entry then stores the global deviation, which does not need to be computed in parallel. This would save memory, but would probably be more confusing.)

EDIT: Outdated structure

Member

@efaulhaber efaulhaber left a comment

I don't like this data structure, but I also don't have any better ideas.

src/callbacks_stage/subcell_bounds_check.jl (outdated, resolved)
Contributor Author

In the last commit (78957b7), I revised the memory structure for the IDP bounds check.
To make it easier to understand, the local and global components of the memory are now split into two separate variables. The local component has to be thread-safe and is therefore initialized as a vector of length nthreads(). This is not required for the global maximum deviation.
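
A rough sketch of how the two components could interact, assuming the global value is simply the running maximum over the per-thread local values. The bound keys, the fixed thread count, and the function `update_global_deviation!` are hypothetical and only serve as an illustration; they are not taken from the PR.

```julia
# Hypothetical bound keys and thread count, for illustration only.
bound_keys = (:rho_min, :rho_max)
n_threads = 4  # stands in for Threads.nthreads()

# Local component: one entry per thread, written during the parallel loop.
idp_bounds_delta_local = Dict(key => zeros(Float64, n_threads) for key in bound_keys)
# Global component: one scalar per bound, updated serially afterwards.
idp_bounds_delta_global = Dict(key => 0.0 for key in bound_keys)

# Hypothetical helper: fold the per-thread maxima into the global maxima.
function update_global_deviation!(delta_global, delta_local)
    for (key, per_thread) in delta_local
        delta_global[key] = max(delta_global[key], maximum(per_thread))
    end
    return delta_global
end

# Pretend that thread 2 found a violation of the lower density bound.
idp_bounds_delta_local[:rho_min][2] = 1.0e-3
update_global_deviation!(idp_bounds_delta_global, idp_bounds_delta_local)
```

Since the reduction runs serially after the threaded loop has finished, the global dictionary needs no per-thread storage.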

@bennibolm bennibolm marked this pull request as ready for review December 20, 2023 16:07
Member

@sloede sloede left a comment

One minor comment left, then go go go!

Member

@sloede sloede left a comment

LGTM, thanks!

@sloede sloede enabled auto-merge (squash) January 30, 2024 09:20
@sloede sloede merged commit f4e6e49 into trixi-framework:main Jan 30, 2024
34 of 35 checks passed
@bennibolm bennibolm deleted the subcell-limiting-parallel-bounds-check branch January 30, 2024 10:09
Labels
parallelization: Related to MPI, threading, tasks etc.
performance: We are greedy
6 participants